Agnostic System Identification for Model-Based Reinforcement Learning: Supplementary Material

Authors

  • Stephane Ross
  • Andrew Bagnell
Abstract

Additional Notation: We first introduce additional notation not used in the paper that is useful in some proofs. In particular, we define $d^t_{\omega,\pi}$ the distribution of states at time $t$ if we executed $\pi$ from time step $1$ to $t-1$, starting from distribution $\omega$ at time $1$, and $d_{\omega,\pi} = (1-\gamma) \sum_{t=1}^{\infty} \gamma^{t-1} d^t_{\omega,\pi}$ the discounted distribution of states over the infinite horizon if we follow $\pi$, starting in $\omega$ at time $1$.
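The discounted state distribution defined above can be illustrated numerically. The following is a minimal sketch, not taken from the paper: it assumes a finite state space and a fixed policy whose induced transition matrix is known, and the names `step`, `discounted_state_distribution`, and `P_pi`, as well as the two-state chain, are hypothetical. It approximates the infinite sum by truncating it at a long horizon.

```python
def step(dist, P):
    """Push a state distribution one step through the row-stochastic matrix P."""
    n = len(P)
    return [sum(dist[s] * P[s][s2] for s in range(n)) for s2 in range(n)]


def discounted_state_distribution(omega, P, gamma, horizon=2000):
    """Approximate (1 - gamma) * sum_{t>=1} gamma^(t-1) * d^t by a truncated sum.

    omega: start distribution at time 1 (assumed to sum to 1).
    P:     transition matrix induced by the fixed policy (assumption: finite MDP).
    """
    d_t = list(omega)          # d^1 equals omega: nothing has been executed yet
    d = [0.0] * len(omega)
    weight = 1.0 - gamma       # (1 - gamma) * gamma^(t-1) at t = 1
    for _ in range(horizon):
        d = [acc + weight * p for acc, p in zip(d, d_t)]
        d_t = step(d_t, P)     # d^{t+1} is d^t pushed through P
        weight *= gamma
    return d


# Hypothetical two-state chain induced by some fixed policy.
P_pi = [[0.9, 0.1],
        [0.5, 0.5]]
omega = [1.0, 0.0]
d = discounted_state_distribution(omega, P_pi, gamma=0.9)
```

With the weights $(1-\gamma)\gamma^{t-1}$ starting at $t = 1$, the result sums to one, so it is a proper distribution over states. For a finite chain the same quantity also satisfies the fixed-point relation $d = (1-\gamma)\,\omega + \gamma\, d P$, which gives a closed form via a linear solve instead of a truncated sum.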

Similar resources

Agnostic System Identification for Model-Based Reinforcement Learning

A fundamental problem in control is to learn a model of a system from observations that is useful for controller synthesis. To provide good performance guarantees, existing methods must assume that the real system is in the class of models considered during learning. We present an iterative method with strong guarantees even in the agnostic case where the system is not in the class. In particul...

Agnostic System Identification for Monte Carlo Planning

While model-based reinforcement learning is often studied under the assumption that a fully accurate model is contained within the model class, this is rarely true in practice. When the model class may be fundamentally limited, it can be difficult to obtain theoretical guarantees. Under some conditions the DAgger algorithm promises a policy nearly as good as the plan obtained from the most accu...

Using BELBIC based optimal controller for omni-directional threewheel robots model identified by LOLIMOT

In this paper, an intelligent controller is applied to control omni-directional robots motion. First, the dynamics of the three wheel robots, as a nonlinear plant with considerable uncertainties, is identified using an efficient algorithm of training, named LoLiMoT. Then, an intelligent controller based on brain emotional learning algorithm is applied to the identified model. This emotional l...

Closed-form analytical solution procedure for element design in D regions

This paper presents a novel procedure for solving the equations system of the rotating crack model used for reinforced concrete. It is implemented in the programme NonOPt where it is used to optimise the reinforcement design of D regions. The procedure is based on solving explicit closed-form relations without the need to incrementally increase the applied loads. The solution procedure is based...

Reinforcement learning based feedback control of tumor growth by limiting maximum chemo-drug dose using fuzzy logic

In this paper, a model-free reinforcement learning-based controller is designed to extract a treatment protocol because the design of a model-based controller is complex due to the highly nonlinear dynamics of cancer. The Q-learning algorithm is used to develop an optimal controller for cancer chemotherapy drug dosing. In the Q-learning algorithm, each entry of the Q-table is updated using data...

Publication date: 2012